Safety
Overview
The safety policy can be used to detect user-provided inputs or model responses that contain unsafe content, such as obscene, toxic, or harmful content.
Safety Policy Definition
Dynamo AI uses the following policy definition for detecting unsafe content.
Name: Prohibit Unsafe Content
Description: Permit prompts that seek responsible, educational information on sensitive topics while prohibiting prompts that promote, facilitate, or normalize harmful content. Harmful content is defined as any material that encourages, instructs, or glorifies actions detrimental to individuals, groups, or society. This includes content involving illegal or criminal activities, violence, harassment, hate speech, discrimination, self-harm, sexual exploitation, child safety violations, privacy breaches, misinformation, dangerous substances, weapons, and malware—as well as any language that is vulgar, abusive, manipulative, or derogatory toward protected characteristics.
Allowed Behaviors:
- Inquiry-based learning: Prompts seeking educational information on sensitive topics, such as historical events, social issues, or scientific concepts, for the purpose of learning and understanding.
- Responsible discussion: Queries that facilitate respectful and open discussions on sensitive topics like violence, discrimination, and harmful substances 3.General knowledge queries: Prompts seeking information on everyday topics, such as news, entertainment, or general knowledge, that do not promote or facilitate harm.
- Personalized advice: Queries seeking advice or guidance on personal issues, such as mental health, relationships, or career development, that do not promote or facilitate harm.
- Responsible Sexual Education: Requests for responsible sexual education, health, or consent
- Advice on Managing Conflicts: Queries on resources for managing conflicts and reporting harassment
- Default allowance: All prompts are allowed unless explicitly prohibited by this policy, ensuring users can engage with the AI system without unnecessary restrictions.
- Mental Health Information: Requests for factual resources for mental health
Prohibited Behaviors:
- Promoting Harm: Queries that encourage or glorify violent, harassing, or discriminatory behavior towards individuals or groups
- Illicit Activities: Requests for information or guidance on illegal or criminal activities, including those related to dangerous substances, creating or obtaining weapons, or malware
- Hate Speech and Harassment: Inputs that contain hate speech, derogatory language, or manipulative content targeting protected characteristics. Includes abusive, biased, or unfair remarks
- Exploitation and Abuse: Queries that promote or facilitate sexual exploitation or child safety violations
- Self-Harm and Violence: Requests or guidance on self-harm, suicidal behavior, or violent acts
- Malware and Privacy Violation Topics: Requests, guidance, or attempts at promoting malware or privacy violations
- Toxic Language: Vulgar, hateful, or offensive language
- Explicit Sexual Content: explicit description of sexual activities, nudity, and pornographic content.
- Sexist or Racist Content: sexist or racists commentary including implicit or explicit discrimination or stereotyping.
Safety Policy Actions
You can manage what happens to inputs and outputs when applying the safety policy using the actions below:
- Flag: flag content for moderator review
- Block: block user inputs or model outputs containing unsafe content